Abstract

Multiple Kernel Learning (MKL) is used to replicate the signal combination process
that trading rules embody when they aggregate multiple sources of financial
information to predict an asset's price movements. A set of financially motivated
kernels is constructed for the EURUSD currency pair and is used to predict the
direction of price movement for the currency over multiple time horizons. MKL is
shown to outperform each of the kernels individually in terms of predictive
accuracy. Furthermore, the kernel weightings selected by MKL highlight which of
the financial features represented by the kernels are the most informative for
predictive tasks.



1 Introduction 

A trader wishing to speculate on a currency's movement is most interested in the direction in which
he believes the price of that currency $P_t$ will move over a time horizon $\Delta t$, so that he can
take a position based on this prediction. Any move that is predicted has to be significant enough to
cross the difference between the buying price (bid) and selling price (ask) in the appropriate
direction if the trader is to profit from it. If we view this as a three class classification task,
then we can simplify this aim into an attempt to predict whether the trader should buy the currency
pair because he believes $P^{Bid}_{t+\Delta t} > P^{Ask}_t$, sell it because
$P^{Ask}_{t+\Delta t} < P^{Bid}_t$, or do nothing because $P^{Bid}_{t+\Delta t} < P^{Ask}_t$ and
$P^{Ask}_{t+\Delta t} > P^{Bid}_t$.

When making trading decisions such as whether to buy or sell a currency, traders typically combine 
the information from many models to create an overall trading rule (see for example [1]). The aim of
this work is to represent this model combination process through Multiple Kernel Learning, where 
individual kernels based on common trading signals are created to represent the constituent sources 
of information. 
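
The combination step can be pictured as follows. MKL methods such as SimpleMKL form a single kernel as a non-negative weighted sum of the base kernels, with the weights summing to one. The sketch below is a minimal illustration in plain Python; the function name `combine_grams` and the hand-picked weights are assumptions made for the example, whereas in MKL the weights are learned together with the classifier.

```python
def combine_grams(grams, weights):
    """Return K = sum_m d_m * K_m for Gram matrices given as nested lists.

    The weights d_m must be non-negative and sum to one.
    """
    if any(d < 0 for d in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to one")
    n = len(grams[0])
    return [
        [sum(d * K[i][j] for d, K in zip(weights, grams)) for j in range(n)]
        for i in range(n)
    ]


# Two toy Gram matrices over the same two instances (illustrative values only).
K_a = [[1.0, 0.0], [0.0, 1.0]]
K_b = [[1.0, 1.0], [1.0, 1.0]]
K_combined = combine_grams([K_a, K_b], [0.25, 0.75])
```

A base kernel whose weight is driven to zero drops out of the combination, which is why the learned weights can be read as a measure of how informative each signal is.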

There has been much work in using kernel based methods such as the SVM to predict the movement
of financial time series, e.g. [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14] and [15].
However the majority of the previous work in this area deals with the problem of kernel selection in 
a purely empirical manner with little to no theoretical justification, and makes no attempts to either 
use financially plausible kernels or indeed to combine kernels in any manner. 



1 This work is closely related to a presentation titled Multiple Kernel Learning on the Limit Order Book
given at the Workshop on Applications of Pattern Analysis 2010. 
2 Financially Motivated Features



2.1 Price-based Features 

The following four features are based on common price-based trading rules (which are described 
briefly in the Appendix): 

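\[
\begin{aligned}
\mathcal{F}_1 &= \{ EMA_t^{L_1}, \ldots, EMA_t^{L_N} \} \\
\mathcal{F}_2 &= \{ MA_t^{L_1}, \ldots, MA_t^{L_N}, \sigma_t^{L_1}, \ldots, \sigma_t^{L_N} \} \\
\mathcal{F}_3 &= \{ P_t, \max\nolimits_t^{L_1}, \ldots, \max\nolimits_t^{L_N}, \min\nolimits_t^{L_1}, \ldots, \min\nolimits_t^{L_N} \} \\
\mathcal{F}_4 &= \{ \Uparrow_t^{L_1}, \ldots, \Uparrow_t^{L_N}, \Downarrow_t^{L_1}, \ldots, \Downarrow_t^{L_N} \}
\end{aligned}
\]

where $EMA_t^{L_i}$ denotes an exponential moving average of the price $P$ at time $t$ with a half life $L_i$,
$\sigma_t^{L_i}$ denotes the standard deviation of $P$ over a period $L_i$, $MA_t^{L_i}$ its simple moving average
over the period $L_i$, $\max_t^{L_i}$ and $\min_t^{L_i}$ the maximum and minimum prices over the period, and
$\Uparrow_t^{L_i}$ and $\Downarrow_t^{L_i}$ the number of price increases and decreases over it.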
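
As a rough illustration of how these four price-based feature vectors could be computed from a price history, consider the sketch below. It rests on several assumptions that the text does not state: the half life $L$ is mapped to a per-step smoothing factor $1 - 0.5^{1/L}$, the standard deviation is the population form (divide by the window length), and the helper names `ema`, `window_stats` and `price_features` are hypothetical.

```python
import math


def ema(prices, half_life):
    """Exponential moving average whose weights halve every `half_life` steps."""
    alpha = 1.0 - 0.5 ** (1.0 / half_life)  # assumed half-life convention
    value = prices[0]
    for p in prices[1:]:
        value = alpha * p + (1.0 - alpha) * value
    return value


def window_stats(prices, period):
    """Return (MA, sigma, max, min, ups, downs) over the last `period` prices."""
    w = prices[-period:]
    mean = sum(w) / len(w)
    sigma = math.sqrt(sum((p - mean) ** 2 for p in w) / len(w))  # population form
    ups = sum(1 for a, b in zip(w, w[1:]) if b > a)
    downs = sum(1 for a, b in zip(w, w[1:]) if b < a)
    return mean, sigma, max(w), min(w), ups, downs


def price_features(prices, periods):
    """Build F1..F4 at the latest time step for periods L_1..L_N."""
    stats = [window_stats(prices, L) for L in periods]
    f1 = [ema(prices, L) for L in periods]
    f2 = [s[0] for s in stats] + [s[1] for s in stats]
    f3 = [prices[-1]] + [s[2] for s in stats] + [s[3] for s in stats]
    f4 = [s[4] for s in stats] + [s[5] for s in stats]
    return f1, f2, f3, f4


# Toy price path and periods, chosen only for illustration.
f1, f2, f3, f4 = price_features([1.0, 2.0, 3.0, 2.0, 4.0], [2, 4])
```

Each feature vector grows with the number of periods $N$, so the dimensionality seen by a kernel depends on how many look-back windows are used.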



2.2 Volume-based Features 



The majority of currency trading takes place on Electronic Communication Networks (ECNs). Con- 
tinuous trading takes place on these exchanges via the arrival of limit orders specifying whether the 
party wishes to buy or sell, the amount (volume) desired, and the price the transaction will occur at. 
While traders had previously been able to view the prices of the highest buy (best bid) and lowest 
sell orders (best ask), a relatively recent development in certain exchanges is the real-time revelation 
of the total volume of trades sitting on the ECN's order book at both these price levels and also at 
price levels above the best ask and below the best bid. This exposure of order books' previously 
hidden depths allows traders to capitalize on the greater dimensionality of data available to them 
when making trading decisions and suggests the use of kernel methods on this higher dimensional 
data. 

Representing the volume at time $t$ at each of the price levels of the order book on both sides as
a vector $V_t$, where $V_t \in \mathbb{R}^6$ for the case of three price levels on each side, a further
set of four features can be constructed:

\[
\mathcal{F}_{5 \ldots 8} = \left\{ V_t, \; \frac{V_t}{\|V_t\|_1}, \; V_t - V_{t-1}, \; \frac{V_t - V_{t-1}}{\|V_t - V_{t-1}\|_1} \right\}
\]
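
A small sketch of the four volume-based features, assuming the normalisation is by the sum of absolute values (1-norm) and that an all-zero vector is left unchanged to avoid division by zero; both choices, and the function names, are assumptions made for illustration.

```python
def l1_norm(v):
    """Sum of absolute values of the entries of v."""
    return sum(abs(x) for x in v)


def normalise(v):
    """Scale v to unit 1-norm; an all-zero vector is returned unchanged (assumption)."""
    n = l1_norm(v)
    return [x / n for x in v] if n > 0 else list(v)


def volume_features(v_t, v_prev):
    """Return (F5, F6, F7, F8): volumes, normalised volumes, change, normalised change."""
    diff = [a - b for a, b in zip(v_t, v_prev)]
    return list(v_t), normalise(v_t), diff, normalise(diff)


# Three bid levels followed by three ask levels; the numbers are made up.
f5, f6, f7, f8 = volume_features(
    [10.0, 20.0, 30.0, 5.0, 15.0, 20.0],
    [10.0, 10.0, 30.0, 5.0, 25.0, 20.0],
)
```

The normalised variants describe the shape of the book (where the volume sits) independently of how much volume there is in total.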

3 Experimental Design

Radial Basis Function (RBF) and polynomial kernels have often been used in financial market
prediction problems, e.g. [7] and [15]. Furthermore, Artificial Neural Networks (ANN) are often used
in financial forecasting tasks (e.g. [16], [17] and [18]) and for this reason a kernel based on
Williams' (1998) [19] infinite neural network with a sigmoidal transfer function is also employed
(see $\mathcal{K}_{11:15}$ below). A feature mapping set consisting of 5 of each of these kernel types
with different values of the relevant hyperparameter ($\sigma$, $d$ or $\Sigma$) along with the linear
kernel is used:

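\[
\begin{aligned}
\mathcal{K}_{1:5} &= \left\{ \exp\left(-\|x - x'\|^2 / \sigma_1^2\right), \ldots, \exp\left(-\|x - x'\|^2 / \sigma_5^2\right) \right\} \\
\mathcal{K}_{6:10} &= \left\{ (\langle x, x' \rangle + 1)^{d_1}, \ldots, (\langle x, x' \rangle + 1)^{d_5} \right\} \\
\mathcal{K}_{11:15} &= \left\{ \frac{2}{\pi} \sin^{-1}\left( \frac{2 x^T \Sigma_1 x'}{\sqrt{(1 + 2 x^T \Sigma_1 x)(1 + 2 x'^T \Sigma_1 x')}} \right), \ldots, \frac{2}{\pi} \sin^{-1}\left( \frac{2 x^T \Sigma_5 x'}{\sqrt{(1 + 2 x^T \Sigma_5 x)(1 + 2 x'^T \Sigma_5 x')}} \right) \right\} \\
\mathcal{K}_{16} &= \left\{ \langle x, x' \rangle \right\}
\end{aligned}
\]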
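
The sixteen kernels can be written down directly as functions of two input vectors. The sketch below is a plain-Python illustration under stated assumptions: the matrix $\Sigma$ of the infinite neural network kernel is taken to be a scalar multiple of the identity, and the hyperparameter values in the example are placeholders, not the values used in the experiments.

```python
import math


def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, sigma):
    """exp(-||x - z||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / sigma ** 2)


def poly_kernel(x, z, d):
    """(<x, z> + 1)^d."""
    return (dot(x, z) + 1.0) ** d


def ann_kernel(x, z, s):
    """Infinite neural network (arcsine) kernel with Sigma = s * I (assumption)."""
    num = 2.0 * s * dot(x, z)
    den = math.sqrt((1.0 + 2.0 * s * dot(x, x)) * (1.0 + 2.0 * s * dot(z, z)))
    return (2.0 / math.pi) * math.asin(num / den)


def linear_kernel(x, z):
    """<x, z>."""
    return dot(x, z)


def kernel_set(sigmas, degrees, scales):
    """Return RBF, polynomial and ANN kernels for the given values, then the linear kernel."""
    kernels = [lambda x, z, s=s: rbf_kernel(x, z, s) for s in sigmas]
    kernels += [lambda x, z, d=d: poly_kernel(x, z, d) for d in degrees]
    kernels += [lambda x, z, s=s: ann_kernel(x, z, s) for s in scales]
    kernels.append(linear_kernel)
    return kernels


# Placeholder hyperparameters: five per kernel family gives 16 kernels in total.
kernels = kernel_set([1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [0.1, 0.5, 1.0, 2.0, 5.0])
```

Binding each hyperparameter as a default argument (`s=s`, `d=d`) keeps every lambda tied to its own value rather than to the last value of the loop variable.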
Table 1: Percentage of time predictions possible

$\Delta t$ (s)   SimpleMKL   $\mathcal{F}_8\mathcal{K}_{16}$   $\mathcal{F}_1\mathcal{K}_1$   $\mathcal{F}_1\mathcal{K}_3$
5                26.1        24.7                              26.1                           24.7
10               41.1        40.4                              39.8                           37.7
20               50.2        49.1                              48.1                           45.0
50               46.3        44.1                              44.8                           45.5
100              32.8        33.5                              34.6                           35.3
200              27.0        24.9                              26.6                           27.4

This means that altogether there are $|\mathcal{F}| \times |\mathcal{K}| = 8 \times 16 = 128$ feature / kernel
combinations. We will adopt notation so that, for example, the combination $\mathcal{F}_1\mathcal{K}_1$ is the
moving average crossover feature with an RBF using the scale parameter $\sigma_1$.

Three SVMs are trained on the data with the following labeling criteria for each SVM:

SVM 1: $P^{Bid}_{t+\Delta t} > P^{Ask}_t \Rightarrow y^1_t = +1$, otherwise $y^1_t = -1$

SVM 2: $P^{Ask}_{t+\Delta t} < P^{Bid}_t \Rightarrow y^2_t = +1$, otherwise $y^2_t = -1$

SVM 3: $P^{Bid}_{t+\Delta t} < P^{Ask}_t$, $P^{Ask}_{t+\Delta t} > P^{Bid}_t \Rightarrow y^3_t = +1$, otherwise $y^3_t = -1$

In this manner, a three dimensional output vector $y_t$ is constructed from $y^1_t$, $y^2_t$ and $y^3_t$ for each
instance such that $y_t = [\pm 1, \pm 1, \pm 1]$. Predictions are only kept for instances where exactly one
of the signs in $y_t$ is positive, i.e. when all three of the classifiers agree on a direction of
movement. For this subset of the predictions, a prediction is deemed correct if it correctly predicts
the direction of spread-crossing movement (i.e. upwards, downwards or no movement) and incorrect
if not.
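
The labeling and agreement rule can be sketched as below. The sketch assumes that an upward spread-crossing move means the future bid exceeds the current ask, and a downward one means the future ask falls below the current bid; the function names `svm_targets` and `combine` are hypothetical.

```python
def svm_targets(bid_now, ask_now, bid_future, ask_future):
    """Return the three +1/-1 training labels [y1, y2, y3] for one instance."""
    y1 = 1 if bid_future > ask_now else -1   # upward move crosses the spread
    y2 = 1 if ask_future < bid_now else -1   # downward move crosses the spread
    y3 = 1 if (bid_future < ask_now and ask_future > bid_now) else -1  # neither
    return [y1, y2, y3]


def combine(outputs):
    """Map three classifier outputs to a direction, or None if they disagree.

    A prediction is kept only when exactly one of the three outputs is +1.
    """
    if outputs.count(1) != 1:
        return None
    return ("up", "down", "none")[outputs.index(1)]


# Toy quotes: the future bid (1.2) is above the current ask (1.1), so "up".
direction = combine(svm_targets(1.0, 1.1, 1.2, 1.3))
```

Instances for which `combine` returns `None` are the ones excluded from the accuracy figures, which is why the frequency of possible predictions is reported separately from accuracy.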

The MKL method of SimpleMKL [20] is investigated along with standard SVMs based on each of
the 128 kernel / feature combinations individually. Predictions for time horizons ($\Delta t$) of 5, 10,
20, 50, 100 and 200 seconds into the future are created. Training and prediction are carried out by
training the three SVMs on 100 instances of in sample data, making predictions regarding the
following 100 instances and then rolling forward 100 instances so that the out of sample data points
in the previous window become the current window's in sample set. The data consists of
$6 \times 10^4$ instances of order book updates for the EURUSD currency pair from the EBS exchange
starting on 2/11/2009.
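
The rolling scheme described above can be expressed as a generator of index ranges. This is a minimal sketch; the name `rolling_windows` is hypothetical and the model fitting itself is left out.

```python
def rolling_windows(n_instances, window=100):
    """Yield (in_sample, out_of_sample) index ranges, rolling forward by `window`.

    Each out-of-sample range becomes the in-sample range of the next step.
    """
    start = 0
    while start + 2 * window <= n_instances:
        yield (range(start, start + window),
               range(start + window, start + 2 * window))
        start += window


# With 6 x 10^4 instances and windows of 100, list every train/test pair.
splits = list(rolling_windows(60000))
```

Because the window advances by exactly its own length, every instance after the first 100 is predicted out of sample once and only once.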

4 Results and Conclusions 

When comparing the predictive accuracy of the kernel methods when used individually to their com- 
bination in MKL one needs to consider both how often each method was able to make a prediction 
as described above and how correct the predictions were overall for the whole dataset. In the tables 
and figures that follow, for the sake of clarity only three of the 128 individual kernels are used when 
comparing SimpleMKL to the individual kernels. 10-fold cross-validation was used to select the 
three kernels with the highest predictive accuracy for the dataset, namely $\mathcal{F}_8\mathcal{K}_{16}$, $\mathcal{F}_1\mathcal{K}_1$ and $\mathcal{F}_1\mathcal{K}_3$.

Table 1, which shows how often each of the methods was able to make a prediction for each of
the time horizons, indicates that SimpleMKL made predictions about as frequently as the three
individual kernel / feature combinations highlighted. Table 2 shows each method's predictive
accuracy over the entire dataset when a prediction was actually possible. The results indicate that
SimpleMKL has higher predictive accuracy than the most effective individual kernels for all time
horizons under 200 seconds and is only marginally less effective than $\mathcal{F}_1\mathcal{K}_3$ for the
200 second forecast horizon.

P-values for the null hypothesis that the results reported could have occurred by chance were
calculated (the methodology for doing this is explained in the Appendix). It was found that for both
SimpleMKL and the individual kernels highlighted, for all forecast horizons, the null hypothesis
could be rejected at a significance level of $< 10^{-5}$.

2 EURUSD was selected as the currency pair to investigate because it is the world's most actively traded
currency pair, comprising 27% of global turnover [21]. Consequently, the EBS exchange was selected for this
analysis because it is the primary ECN for EURUSD.

Figure 1: MKL Kernel weightings (axis label: Feature / Kernel Combination)

Table 2: Percentage accuracy of predictions

$\Delta t$ (s)   SimpleMKL   $\mathcal{F}_8\mathcal{K}_{16}$   $\mathcal{F}_1\mathcal{K}_1$   $\mathcal{F}_1\mathcal{K}_3$
5                94.7        94.7                              93.0                           92.8
10               89.9        89.6                              88.4                           84.6
20               81.7        81.3                              79.5                           72.3
50               67.1        65.4                              65.5                           61.1
100              61.1        51.1                              60.7                           59.9
200              58.9        45.0                              58.8                           61.3

As reflected in Figure 1, the kernel / feature combinations $\mathcal{F}_1\mathcal{K}_1$, $\mathcal{F}_2\mathcal{K}_5$ and
$\mathcal{F}_3\mathcal{K}_5$ are consistently awarded the highest weightings by SimpleMKL and hence are the most
relevant for making predictions over the data set. These kernels are the RBF mapping with the smallest
scale parameter on the exponential moving average crossover feature, the RBF mapping with the largest
scale parameter on the price standard deviation / moving average feature and the RBF mapping with the
largest scale parameter again on the minimums / maximums feature.

The vertical banding of colour (or intensity) highlights the consistency of each kernel / feature
combination's weighting across the different time horizons: in almost all cases the weighting for a
particular combination does not differ significantly between predictions for a short time horizon
and those for a longer one. One can also see from Figure 1 that although all 8 of the features have
weightings assigned to them, in most cases this is only in conjunction with the RBF kernels; the
polynomial (Poly) and infinite neural network (ANN) based mappings are assigned weightings by MKL
for only the fourth and fifth features.

The most successful individual kernels as selected by cross-validation are awarded very low weights
by SimpleMKL. This reflects a common feature of trading rules, where individual signals can
drastically change their significance in terms of performance when used in combination. Furthermore,
the outperformance of SimpleMKL relative to the individual kernels highlighted indicates that MKL is
an effective method for combining a set of price and volume based features in order to correctly
forecast the direction of price movements in a manner similar to a trading rule.
Appendix



Price-based Features 

• $\mathcal{F}_1$: A common trading rule is the moving average crossover technique (see for example
[22]) which suggests that the price $P_t$ will move up when its short term moving average
$EMA_t^{short}$ crosses above a longer term one $EMA_t^{long}$ and vice versa.

• $\mathcal{F}_2$: Breakout trading rules (see for example [23]) look to see if the price has broken above
or below a certain threshold and assume that once the price has broken through this
threshold the direction of the price movement will persist. One way of defining this threshold
is through the use of Bollinger Bands [24], where the upper/lower thresholds are set by
adding/subtracting a certain number of standard deviations of the price movement $\sigma_t^L$ to
the average price $MA_t^L$ for a period $L$.

• $\mathcal{F}_3$: Another breakout trading rule called the Donchian Trend system [23] determines
whether the price has risen above its maximum $\max_t^L$ or below its minimum $\min_t^L$ over a
period $L$ and once again assumes that once the price has broken through this threshold the
direction of the price movement will persist.

• $\mathcal{F}_4$: The Relative Strength Index trading rule [25] is based on the premise that there is a
relationship between the number of times the price has gone up over a period $\Uparrow_t^L$ vs the
number of times it has fallen $\Downarrow_t^L$ and assumes that the price is more likely to move upwards
if $\Uparrow_t^L > \Downarrow_t^L$ and vice versa.

Calculation of p-values 

• For each in sample period, the proportion of occurrences of each of the three classes of 
movement (up, down or none) over the 100 instances of in sample data was determined. 

• Predictions of movement were then generated randomly for each of the instances of the 
out of sample period where a prediction was deemed possible by SimpleMKL / individual 
kernel (as explained in section 3), each class having a probability of being assigned based 
on the in sample proportions. 

• This was repeated $10^5$ times for each out of sample section, with the number of times the
randomly generated predictions were correct, along with the number of times SimpleMKL
/ individual kernel was correct for that period, recorded each time.

• The proportion of the $10^5$ iterations in which the number of correct predictions recorded for all
the out of sample periods was greater than that reported by SimpleMKL / individual kernel
was used to calculate the p-value.

• In the work reported here, not one of the $10^5$ iterations of randomly generated predictions
outperformed the SimpleMKL / individual kernel methods.
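
The procedure above can be sketched as a small Monte Carlo routine. This is an illustration under assumptions rather than the code used for the reported results: the function name, the data layout (one pair of label lists per window, with the out of sample list already restricted to instances where a prediction was possible) and the reduced default iteration count are all choices made for the example.

```python
import random
from collections import Counter

CLASSES = ("up", "down", "none")


def monte_carlo_p_value(windows, model_correct, n_iter=10000, seed=0):
    """Estimate how often class-proportion guessing beats the model.

    windows: list of (in_sample_labels, out_of_sample_labels) pairs.
    model_correct: total number of correct model predictions over all windows.
    """
    rng = random.Random(seed)
    # Class counts of each in-sample period act as sampling weights.
    weights = []
    for in_labels, _out_labels in windows:
        counts = Counter(in_labels)
        weights.append([counts[c] for c in CLASSES])
    beats = 0
    for _ in range(n_iter):
        total = 0
        for (_in_labels, out_labels), w in zip(windows, weights):
            guesses = rng.choices(CLASSES, weights=w, k=len(out_labels))
            total += sum(g == y for g, y in zip(guesses, out_labels))
        if total > model_correct:
            beats += 1
    return beats / n_iter


# Degenerate toy case: only "up" ever occurs, so guessing always scores 10 of 10.
toy = [(["up"] * 100, ["up"] * 10)]
p_tie = monte_carlo_p_value(toy, model_correct=10, n_iter=200)
p_beaten = monte_carlo_p_value(toy, model_correct=9, n_iter=200)
```

With $10^5$ iterations and no random run beating the model, the estimated proportion is zero, which corresponds to a p-value below $10^{-5}$.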